ARI Research Reports and Technical Papers are intended for sponsors of
R&D tasks and other research and military agencies. Any findings ready for
implementation at the time of publication are presented in the latter part of
the Brief. Upon completion of a major phase of the task, formal recommendations
for official action normally are conveyed to appropriate military
agencies by briefing or Disposition Form.


Improvement  in  the  efficiency  and  economy  of  individual 
enlisted  training,  evaluation,  and  utilization  is  essential  to 
maintain  maximum  combat  readiness  of  the  Army,  and  is  a major 
concern  of  the  Individual  Training  & Skill  Evaluation  Technical 
Area  of  the  Army  Research  Institute  for  the  Behavioral  and  Social 
Sciences  (ARI).  The  present  Army  policy  emphasizes  performance- 
based  training  and  testing;  ARI  research  has  made  possible  the 
development,  validation,  and  application  of  performance-based, 
criterion-referenced  Skill  Qualification  Tests  (SQTs)  as  well  as 
self-contained  procedures  by  which  Army/Test  Development  Agencies 
can  construct  and  validate  the  SQTs. 


The  present  report  discusses  the  SQT  program,  its  principles 
of  test  construction,  and  the  benefits  expected  in  its  utiliza- 
tion. Research  was  accomplished  under  Army  Project  2Q762722A764, 
and  is  directly  responsive  to  the  requirements  of  the  Individual 
Training  and  Evaluation  Directorate  (ITED)  of  the  Army  Training 
Support Center, Fort Eustis, Virginia.

Requirement:

Army training and personnel management require job performance
tests that are fair to all soldiers, feasible for worldwide
administration, and measure performance on critical job tasks.


Procedure:

Procedures for developing Skill Qualification Tests (SQTs)
were prepared and tried out by Army test development agencies.
The procedures cover assuring that the tests have content validity
and verifying that the tests are accurate measures of
performance.


Results:

• Procedures for developing criterion-referenced, performance-based
evaluations of task performance.

• Procedures for determining accuracy of the tests as
measures of performance.

• Guidelines and self-instructional materials for developing
SQTs. The procedures are designed to assure that the
tests are based on realistic job requirements and that the
scores reflect successful task performance (that is, they
are criterion referenced). The general test content,
therefore, can be open knowledge, and subsequent management
decisionmaking can be based on how well soldiers
attain performance standards.


Utilization: 

The procedures for constructing and validating Skill Qualification
Tests are in use for developing more than 1000 tests for
evaluating job proficiency in the Army enlisted force. The
guidelines and self-instructional materials are used to train
personnel at the more than thirty Test Development Agencies on
how to develop SQTs.

CRITERION-REFERENCED JOB PROFICIENCY TESTING: A LARGE SCALE
APPLICATION


OVERVIEW 

Skill Qualification Tests (SQT) have been developed to replace
Military Occupational Specialty (MOS) proficiency tests as measures of
ability to perform Army enlisted jobs. SQTs are performance-based,
criterion-referenced measures of job proficiency, consisting of precisely
defined tests of tasks, all of which are critical and necessary to
performance of the job. The criterion-referenced approach provides an
explicit relationship between job requirements and test content in that
job requirements dictate content of SQTs. The SQT development process
requires that tests be reviewed by subject matter experts and validated
on representative job incumbents to assure that test content is job
relevant. Test standards of acceptable levels of performance are also
based on job requirements and test content. Performance standards are
based on behaviorally derived absolute scoring standards, and are not
based on performance relative to other soldiers who take the test. For
these reasons SQTs are justifiably viewed as criterion-referenced tests
of job proficiency.

A criterion-referenced  testing  system  offers  two  significant  advan- 
tages not  available  in  traditional  testing  programs.  One  is  that  test 
content  can  be  made  public  in  advance  of  administration.  There  are  no 
reasons  to  keep  test  content  secret  in  a testing  program  based  on 
explicit linkages between test content and job requirements. Advance
knowledge  of  test  content  results  in  an  equitable  and  open  system. 
Everyone  has  an  equal  opportunity  to  acquire  proficiency  on  the  specific 
job  tasks  known  to  be  included  in  the  test. 

The  second  is  that  a criterion-referenced  approach  allows  personnel 
management  decisions  such  as  promotion,  selection,  and  advanced  school- 
ing to  be  based  on  performance  standards  instead  of  personnel  quotas. 

In  more  complicated  situations  involving  the  merging  or  splitting  of  job 
specialties  at  higher  skill  levels,  soldiers  from  different  specialties 
can  be  compared  on  the  basis  of  their  levels  of  competence  instead  of 
their  relative  standing  in  the  testing  group.  Criterion-referenced 
testing  of  job  proficiency  has  opened  new  opportunities  for  both  training 
and  personnel  management. 
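
The difference between selecting on a performance standard and selecting
on a personnel quota can be illustrated with a short sketch. The soldier
labels, scores, quota of three, and standard of 80 below are hypothetical
values chosen only for illustration; they are not figures from the SQT
program.

```python
# Hypothetical total scores for four soldiers (illustrative values only).
scores = {"A": 92, "B": 85, "C": 78, "D": 64}


def select_by_quota(scores, quota):
    """Norm-referenced rule: take the top `quota` scorers, whatever their level."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:quota]


def select_by_standard(scores, standard):
    """Criterion-referenced rule: take everyone who meets the fixed standard."""
    return [name for name, score in scores.items() if score >= standard]


quota_picks = select_by_quota(scores, quota=3)
standard_picks = select_by_standard(scores, standard=80)
print(quota_picks)     # ['A', 'B', 'C']
print(standard_picks)  # ['A', 'B']
```

Under the quota rule soldier C is selected despite falling below the
assumed standard; under the standard rule the number selected depends only
on how many soldiers demonstrate the required level of performance.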


BACKGROUND 

The  Army  has  been  using  tests  to  measure  job  proficiency  for  over 
15  years.  These  tests,  called  Military  Occupational  Specialty  (MOS) 
proficiency  tests,  were  designed  primarily  to  help  personnel  managers  in 
making decisions of vital importance to individuals' careers, such as
proficiency pay, promotion, and assignments. The MOS tests were traditional
achievement tests, consisting of 125 multiple choice items, each
with four alternatives. The test content was related generally to the
domain of job performance, but there was no definitive logical correspondence
between test items and specific job requirements. Each item was
scored pass-fail; the total score was the number of items correct, and
the total score was then used to rank order persons in each job specialty.
Therefore, any referencing of test score to test content was immediately
abandoned.

While such proficiency tests have use in personnel management decisions,
they did not fully serve the Army training needs. Because of
content limitations, lack of content-score correspondence, minimal diagnostic
utility, and the long delay in providing feedback to the field
(up to one year after testing), Army trainers did not find MOS tests
particularly useful for determining training requirements, measuring
individual and unit performance, and defining training readiness.

Army training during this same period, especially in the late 1960's
and early 1970's, was undergoing a major revolution. Performance-based
training and testing, based on critical job tasks and criterion-referenced
standards of performance, were being implemented in entry-level
training courses. Training objectives were operationally defined by the
performance tests given during the course, and tests were made public to
students as well as instructors. The content of these tests was always
directly relevant to the job. The tests themselves were used to drive
the direction of training.

Tests, because of their function in maintaining accountability,
are effective instruments in bringing about institutional change. Test
content helps implement doctrine about the way jobs are to be performed,
and is helpful in defining training requirements and standards. The public
nature of the tests helps focus attention on the critical elements of
the job, enables effective use of soldiers' time in preparing for tests,
and thus improves individual readiness.

So impressive was the success of performance-based training and
testing that the Army made the policy decision to change from the existing
mode of job proficiency testing, typically referred to as "norm-referenced,
paper-and-pencil testing," to the criterion-referenced mode
of proficiency testing. These new criterion-referenced tests, called
Skill Qualification Tests (SQT), are having a profound impact on the
entire Army community. The new testing procedures are forcing training
managers, personnel managers, and research support personnel to rethink
and often redefine their functions.


REQUIREMENTS FOR SKILL QUALIFICATION TESTS

The basic requirement of SQTs is that the tests are job relevant.
The test content must be based on job requirements, and the test scores
must be accurate measures of ability to perform critical job tasks.

The job relevance of SQTs is assured by basing them on Soldier's
Manuals. Soldier's Manuals contain the critical job tasks, the behaviors
required to perform the tasks, the job conditions, and the standards
of performance. Soldier's Manuals define the jobs in that they list all
the tasks soldiers in a job specialty are responsible for performing.
Since SQTs are based on Soldier's Manuals, the SQTs are job relevant.


PERFORMANCE INFORMATION FOR TRAINING AND PERSONNEL MANAGEMENT

SQTs are used by both training and personnel management to help make
important decisions affecting the career development of soldiers. Both
training and personnel management need timely and accurate information
about how well individuals are performing: training management to determine
training requirements of individuals, and personnel management to
help determine whom to promote, reclassify, or reassign. Although both
training and personnel management have a need for the same kind of
information, their immediate requirements are not identical.

Training managers base their immediate training requirements on the
specific tasks performed in their units. The job relevance of tests for
specific assignments, therefore, is the primary consideration from this
point of view and it is defined in terms of the tasks that soldiers perform
in their assignments. The set of tasks performed in an assignment
is generally a subset of the tasks required in a specialty. The task is
a convenient unit for determining training requirements because tasks are
observable, have initiating and terminating cues, and have standards of
performance that can be reasonably well specified. Decisions about
proficiency can be made at the task level, and training managers can
identify the specific tasks on which soldiers need training. If the
test measures performance on the specific tasks for which the training
managers have responsibility, then the tests are serving their basic
purpose.
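
A minimal sketch of how task-level results might be turned into training
requirements follows. The soldier names, task names, and the rule that an
untested task counts as needing training are all assumptions made for
illustration; the source does not prescribe a data format or procedure.

```python
# Hypothetical per-task results for each soldier (True = task passed).
results = {
    "Smith": {"load weapon": True, "clear weapon": False, "read map": True},
    "Jones": {"load weapon": True, "clear weapon": True, "read map": False},
}

# Hypothetical subset of the specialty's tasks performed in this unit.
unit_tasks = {"load weapon", "clear weapon"}


def training_needs(results, unit_tasks):
    """Return, per soldier, the unit tasks that were not passed.

    Assumption: a unit task with no recorded result is treated as not passed.
    Task lists are sorted so the output order is stable.
    """
    needs = {}
    for soldier, by_task in results.items():
        failed = sorted(t for t in unit_tasks if not by_task.get(t, False))
        if failed:
            needs[soldier] = failed
    return needs


needs = training_needs(results, unit_tasks)
print(needs)  # {'Smith': ['clear weapon']}
```

Jones failed a task ("read map") that lies outside the unit's assignment
subset, so it does not appear as an immediate training requirement for this
unit, although it would still matter for specialty-wide personnel decisions.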

Personnel managers are also concerned with the job performance of
individual soldiers; but rather than focusing on soldiers' specific
assignments, personnel managers need to know how well soldiers can
perform all the tasks in a specialty. For example, performance in a
specialty, such as Infantryman or Wheeled Vehicle Mechanic, cannot
necessarily be inferred from the set of tasks found in any one assignment.
Personnel managers, therefore, have a need for information based
on a standard set of tasks for each specialty. All soldiers in a
specialty need to be evaluated on the same set of tasks to enable fair
decisions about which soldiers to promote, retain, or reclassify. The
need for a standard set of tasks in each specialty imposes additional
testing requirements for feasibility and acceptability. The test scores
should not be affected by when or where the test is taken, nor by whom
it is administered and scored. The testing conditions, as well as
performance standards, should be standardized.

The requirement for Army-wide standardization at the present state of
the art in testing means that initially most of the test content is in
the paper-and-pencil mode rather than hands-on performance tests. Paper
and pencil tests generally lack the apparent job relevance of hands-on
performance tests, and therefore an additional requirement is imposed to
assure that the tests are acceptable to examinees, supervisors and commanders
as valid measures of job proficiency.

Job  relevance  of  the  tests  is  the  basic  requirement  for  both  training 
and  personnel  management,  even  though  the  definition  of  job  relevance 
may  have  somewhat  different  meanings  for  the  two  purposes.  For  training 
purposes  the  focus  is  on  the  subset  of  tasks  performed  in  the  specific 
job  assignment,  whereas  for  personnel  purposes  the  interest  is  on  the 
entire  set  of  tasks  in  the  specialty. 

The SQTs are designed to serve the requirements of training and
personnel  management.  Because  of  their  somewhat  divergent  immediate 
needs,  critical  issues  arise  in  how  SQTs  are  developed,  scored,  and 
used.  These  issues — notably  the  public  nature  of  test  content  and 
personnel  quotas  as  performance  standards — are  treated  in  this  paper. 

The  next  section  describes  the  development  of  Skill  Qualification 
Tests  and  expands  on  the  technical  requirements,  managerial  requirements, 
and  practical  constraints  described  in  this  section.  The  subsequent 
sections  describe  assumptions  in  scoring  SQT  and  benefits  resulting  from 
adopting a criterion-referenced approach to SQT development. The
magnitude  of  these  benefits  far  outweighs  the  costs  of  developing  and 
implementing  such  a large-scale  program. 


DEVELOPMENT  OF  SKILL  QUALIFICATION  TESTS 

The Skill Qualification Testing (SQT) program is a large scale attempt
to provide valid and efficient measures of job proficiency. This
section describes the process of developing an SQT, which assures that
the tests are fair, feasible and acceptable. Because of the strategic
importance of Skill Qualification Tests to both training and personnel
management, high level policy decisions were made about test content,
validation, and scoring. The general requirements of the program are
that tests must be (a) fair and feasible and (b) have validity demonstrated
in advance of operational use.

FAIRNESS  AND  FEASIBILITY  OF  THE  TESTS 

Fairness means that all soldiers have an equal opportunity to demonstrate
their true level of job competence. Test content must be based
on actual job requirements, and testing conditions must be sufficiently
constant throughout the Army so that scores obtained from administrations
under varied conditions are not noticeably different. Tests given in
Alaska, Panama, and Korea must all be administered under similar conditions,
and, in addition, all persons administering and scoring the tests
must be able to do so accurately and objectively. An additional requirement
is that the tests must be acceptable to soldiers and knowledgeable
experts as fair measures of ability to perform critical job tasks. Therefore,
fairness attends to requirements of both training and personnel
management.

Feasibility  requires  that  the  tests  be  suitable  for  administration  in 
all  types  of  units;  equipment,  terrain,  personnel,  and  all  testing 
material  must  be  readily  available.  Another  aspect  of  feasibility  is 
that  testing  time  must  be  reasonable,  with  up  to  one  day  allowed  for 
testing  each  soldier. 

The requirements that Skill Qualification Tests be fair and feasible
put severe limitations on the use of hands-on performance tests. The
history of performance testing is that scoring accuracy and standardization
are difficult to obtain. The resolution of the fairness and
feasibility requirements is to have several kinds of testing. Under
present policy decisions, all Skill Qualification Tests contain a written
component, and some Skill Qualification Tests contain a hands-on component.
Four hours of testing is allowed for the written component, and up to
four hours is allowed for the hands-on portion. A third component,
called performance certification, can also be included in Skill Qualification
Tests. It is essentially an observational evaluation of actual
job performance.

Therefore, an SQT may include up to three distinct types of tests,
each with its own inherent strengths and weaknesses. A combination of
these tests is the operational answer to the fairness and feasibility
requirement.

Types of Tests. Hands-on performance tests are most desirable.
They are a form of structured observation where a scorer evaluates an
individual on a set of performance measures (observable behaviors).
Advantages of hands-on testing are obvious: it tests actual performance,
has high fidelity to the job, allows for immediate feedback, and has
high face validity to examinees. However, considerable developmental
effort is required to insure scoring reliability and standardization of
conditions. It also is expensive in terms of equipment, personnel, and
time, i.e., feasibility is often a problem. In order to ensure feasibility
there is a natural tendency to truncate tests of tasks by shrinking
the boundaries. Unfortunately, this may be at the expense of the
validity of the test. For these reasons it is extremely difficult, if
not impractical, to initiate a large-scale hands-on testing system for an
organization as large as the Army. Therefore, a hands-on component
constitutes a subset of an SQT.

An alternative form of hands-on testing is performance certification.
The performance certification component covers tasks that are too long
and/or complex to include in the hands-on component, and that do not
lend themselves to testing in a written mode. Performance certification
tests are to be administered and scored by soldiers' supervisors in the
normal job setting. Performance certification allows greater flexibility
and avoids some of the feasibility problems encountered in a hands-on
component. The greatest problems in performance certification are insuring
reasonable standardization of job testing conditions across
individuals and standardization of scoring by supervisors. Until sound
methods are developed for addressing these problems, performance certification
will remain a small portion of an SQT.

The decision to include a written component requires careful consideration
and analysis of what criterion-referenced measurement means in this
context. Since the focus of Skill Qualification Tests is on ability to
perform critical job tasks, that aspect must be retained. Each written
test of a task is to consist of a set of items, where each item is designed
to measure an essential behavior or step in performing the task.
For tasks that require primarily mental skills, such as those in supply
and administration, written tests of tasks are often similar to the
behaviors required on the job, and the standards for ability to perform
the test of the task can be reasonably close to those on the job. For
other tasks that require psychomotor skills, written test items only
simulate actual job behaviors, and the setting of realistic standards
indicating ability to perform the task is a more arbitrary process. To
help approximate realistic job conditions, written items may have multiple
correct responses and a variable number of alternatives. This added
flexibility increases the difficulty in developing appropriate methods
for setting standards. The determination of reasonable standards for
written tests of tasks is one of the most difficult issues in the SQT
program.

Scoring the Tests. Because Army jobs and training programs are
structured in terms of critical tasks, the appropriate level of scoring
for the SQT should also be based on tasks. The concept of "scorable
unit" was invented to help assure criterion-referenced measurement of
task performance. A scorable unit is designed to measure ability to
perform a specific task, or in the case of complex tasks, a well defined
subtask.

Each written scorable unit consists of a set of items, where each
item is designed to measure an essential behavior or step in performing
the task. Each item is scored pass-fail, and a prescribed number of
items must be passed to be GO on the written scorable unit. A GO is
counted as ability to perform the task. The current resolution to setting
standards for written scorable units is to require that an a priori
number of items be passed. For example, if a scorable unit contains
five items, then four must be passed to obtain a GO.
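
The scoring rule for a scorable unit can be sketched in a few lines. The
function name and the "NO-GO" label for the failing outcome are assumptions
made for illustration; the text above specifies only the pass-fail items,
the a priori standard, and the GO outcome.

```python
def score_unit(item_results, required):
    """Return 'GO' if at least `required` items were passed, else 'NO-GO'.

    item_results: list of booleans, one per item (True = item passed).
    required: the a priori number of items that must be passed.
    """
    passed = sum(item_results)  # True counts as 1, False as 0
    return "GO" if passed >= required else "NO-GO"


# Five-item scorable unit with a standard of four, as in the example above.
print(score_unit([True, True, True, True, False], required=4))   # GO
print(score_unit([True, True, False, True, False], required=4))  # NO-GO
```

Because every item is scored one or zero, the rule weights all tested
behaviors equally; only the count of items passed matters, not which ones.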

Hands-on and performance certification scorable units consist of a
set of performance measures, where each performance measure is scored
pass-fail, and a prescribed number of performance measures must be passed
to be GO on the scorable unit. A GO on the scorable unit is interpreted
as ability to perform the task. The standards of GO generally are
comparable to what is required on the job.


The requirement that all scorable units be acceptable as fair measures
of ability to perform tasks is applied to both the hands-on and
written tests. Juries of experts must agree that the written items and
hands-on performance measures reflect ability to perform the tasks. Perhaps
a safer statement would be that failure to pass the items indicates
that the person is not able to perform the task.

The most critical requirement of SQTs is their job relevance. The
procedures for establishing job relevance are described in the following
section.


ESTABLISHING A CORRESPONDENCE BETWEEN TEST CONTENT AND JOB TASKS

Test content of all SQTs is a sample of critical tasks from the
domain of job tasks in the specialty. In this way the tests have a
specifiable and explicit link to the job. For each Army job there
exists a Soldier's Manual that lists the tasks for which a soldier in
that specialty is responsible. Therefore, this set of tasks becomes the
operational definition of the job. Tests to measure performance on
specific job tasks listed in the Soldier's Manual are developed from
appropriate task analyses, and the tests for each task are operational
definitions of performance on the tasks. Performance on the individual
tasks is summed to obtain a total score, which in turn serves as the
operational definition of job competence. Modern instructional technology,
with its emphasis on specification of objectives and verification
that those objectives are attained, supports the above process for
establishing the content and focus of SQTs, and thereby lends added
credibility to these procedures.

Though the task is the basic level of analysis, the validity of task
proficiency measurement depends on the adequacy of the test of the task.
By means of detailed task analyses, the set of performance measures or
behaviors required for successful performance of the task are identified.
These lists of performance measures are all available in the Soldier's
Manual. Each item developed to test for task proficiency must occupy a
clearly specified relationship to a performance measure required in
task performance. Assuming that the set of items developed for a test
of a task has been selected in accordance with the procedures described
above, one may assume with reasonably high confidence that successful
performance of each tested behavior is a necessary condition for successful
performance of the task. How to score the set of items in a written
scorable unit to obtain estimates of ability to perform tasks is a
complex question. Measurement error is always a problem that must be
allowed for. Whether being scored GO on a test of a task requires
passing all items included in the test or something less than perfection
depends on the nature of the task, the fidelity with which the task can
be tested in a written mode, the complexity of the format (e.g., multiple
correct responses), and the number of items within the cluster. Use
of subject matter experts in reaching such a determination is mandatory.

In the case of a hands-on test of a task, measurement error arising
from the use of words is minimized. However, other measurement problems
arise. One is that a full performance test of a task generally is not
feasible. It may be too costly in terms of time, equipment, or personnel.
Therefore, a truncated test of the task is often developed by
eliminating some of the performance measures or steps required for the
full performance test. By truncating the test, though, it is possible
that the tested portion is necessary to successful task performance, but
is not sufficient.

VALIDATING  TESTS  PRIOR  TO  ADMINISTRATION 

A first question to be resolved was how to define validity. The
starting point was the usual definition of validity, i.e., that the tests
measure what they are intended to measure. In the case of Skill Qualification
Tests, the intent is to measure ability to perform critical job
tasks. The content of the tests, therefore, becomes the crucial factor
in establishing validity. The content must be thoroughly reviewed by experts
to ensure that the right behaviors and decisions are assembled in
each scorable unit. The first requirement, then, is consistent agreement
among experts that the content of the test is based on ability to perform
critical job tasks. A second requirement is that the scorable units discriminate
between performers (masters) and nonperformers (non-masters).
A third requirement applies only to written scorable units. All items
in a written scorable unit must be consistent estimators of mastery on
the task covered by the entire scorable unit. Thus, the conceptualizing
of validity focuses on consistency: consistency between the content of
the test and the job tasks, consistency among expert reviews, and consistency
in identifying mastery.

Skill Qualification Tests are constructed and validated by Army
agencies that have resident expertise in the job specialties. Generally
these are the Army schools, but they also include other agencies, such
as the Health Services Command. Since the test content must reflect job
tasks, the test developers must have detailed task analyses available that
identify the behaviors essential to successful performance of the tasks.
Skill Qualification Tests are developed in the following conceptual
sequence:

1.  Identify  tasks  for  testing; 

2.  Identify behaviors or steps essential for performing each task;

3.  Develop  scorable  units  to  cover  essential  behaviors  of  the  task, 
and  review  scorable  units  for  content  validity; 

4.  Try  out  scorable  units  on  soldiers  to  verify  accuracy  of 
measurement. 

After  each  step  in  the  process,  the  products  are  submitted  to  higher 
headquarters  for  review  and  approval.  The  content  of  the  scorable  units 
is  fixed  after  step  3.  Scorable  units  found  to  be  unsatisfactory  through 
tryout  on  soldiers  can  be  revised,  but  the  content  cannot  be  changed. 

Test content is fixed through agreement among experts that the contents
of the scorable units are indeed valid measures of ability to perform the
tasks,  and  the  tryout  serves  only  to  establish  the  measurement  properties 
of  the  scorable  units. 


The  tryout  with  soldiers  is  different  for  the  hands-on  and  written 
components.  For  the  hands-on  tests,  the  primary  concern  is  to  establish 
that  the  performance  measures  can  be  scored  accurately.  Acceptable 
agreement  among  the  scores  is  considered  to  be  attained  when  80  percent 
of  all  pairs  of  rater  scores  are  the  same  for  the  performance  measures 
in  a scorable  unit.  If  less  than  80  percent  agreement  is  obtained,  then 
the  performance  measures  are  revised  until  an  adequate  level  of  scoring 
consistency  is  attained. 
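
One way to compute the 80 percent criterion is sketched below. The data
layout (one row per performance measure, one score per rater) and the
pooling of pairs across all measures in the scorable unit are assumptions
about how "all pairs of rater scores" is tallied; the ratings themselves
are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Fraction of rater pairs giving the same score, pooled over measures.

    ratings: list of rows, one per performance measure; each row holds the
    pass (1) / fail (0) score each rater assigned to the same performance.
    """
    same = 0
    total = 0
    for row in ratings:
        for a, b in combinations(row, 2):  # every unordered pair of raters
            same += (a == b)
            total += 1
    return same / total


# Hypothetical tryout: four performance measures scored by three raters.
ratings = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
agreement = pairwise_agreement(ratings)
print(round(agreement, 2))  # 0.83
print(agreement >= 0.80)    # True
```

Here 10 of the 12 rater pairs agree, so the scorable unit would meet the
criterion; a result below 0.80 would send the performance measures back
for revision.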

For written tests the tryout is concerned with establishing the
effectiveness of scorable units in distinguishing between performers and
nonperformers, and with assuring that all elements in a scorable unit
are consistent in estimating ability to perform the task. This tryout
helps assure that all items of a scorable unit contribute to measuring
performance on the task.

A final evaluation of the written scorable units is conducted after
operational administration of the tests. A representative sample of
answer sheets is selected for analysis, and the difficulty of each item
and scorable unit is obtained. Those with high difficulty are examined to
determine if they are faulty. Faulty items and scorable units are
deleted prior to final scoring. When all steps of the review and
analysis procedure for the written scorable units are accomplished, their
validity as fair measures of ability to perform job tasks is considered
to be reasonably well established.
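
As a rough sketch of the difficulty screen, the proportion of soldiers answering each item correctly can be computed from the sampled answer sheets, and the hardest items flagged for examination. The 0.30 flagging threshold and the sample data are assumptions made for illustration; the report does not state a numerical threshold.

```python
def proportion_correct(answer_sheets):
    """Proportion of sampled soldiers passing each item (1 = correct)."""
    n_sheets = len(answer_sheets)
    n_items = len(answer_sheets[0])
    return [
        sum(sheet[i] for sheet in answer_sheets) / n_sheets
        for i in range(n_items)
    ]

def flag_for_review(p_values, min_p=0.30):
    """Indexes of items so difficult that they should be examined.

    min_p is a hypothetical threshold; flagged items are reviewed,
    not deleted automatically.
    """
    return [i for i, p in enumerate(p_values) if p < min_p]

# Hypothetical sample: four answer sheets, four items.
sheets = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
p_values = proportion_correct(sheets)
flagged = flag_for_review(p_values)
```

Only the third item (index 2) falls below the assumed threshold here. Whether it is actually faulty remains a judgment made on examination, as the text describes.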


ASSUMPTIONS  FOR  USING  SQT  SCORES 

The  assumptions  on  which  SQTs  are  scored  can  be  clearly  explicated, 
as  can  the  operations  that  determine  test  content,  scoring  procedures 
and  standards. 

In  this  section,  three  sets  of  assumptions  made  in  using  SQT  scores 
are  considered.  These  are  using  SQTs  to  (a)  help  determine  training 
requirements,  (b)  help  select  soldiers  in  a single  specialty,  and  (c) 
help  select  soldiers  in  merged  specialties. 


HELP  DETERMINE  TRAINING  REQUIREMENTS 

The  assumptions  required  for  using  SQTs  to  help  determine  training 
requirements  are  straightforward.  They  are  simply:  (a)  tasks  can  be 
defined — task  elements  or  behaviors  can  be  specified,  conditions  given, 
and  standards  of  adequate  performance  established;  (b)  tasks  can  be 
measured  validly — performance  on  the  task  is  measured  by  scorable  units, 
which  contain  time  or  performance  measures  related  to  task  elements,  and 
the  sum  of  the  elements  passed  in  a scorable  unit  indicates  quality  of 
performance  on  the  task;  (c)  task  elements  are  weighted  equally — items 
or  performance  measures  corresponding  to  task  elements  or  behaviors  are 
scored  as  pass-fail,  or  as  one-zero. 

These three assumptions serve to provide operational definitions of
performance on the tasks measured in SQTs. Although task elements do
not have to be weighted equally, research evidence indicates that
differential weighting generally does not improve the quality of
measurement. A common practice is to give an element greater weight by
preparing several items or performance measures for it.
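
The scoring assumptions can be sketched as follows: every item or performance measure counts one or zero, and an element gains weight only by being covered by more measures. The element names and results are hypothetical.

```python
from collections import Counter

def unit_score(results):
    """Number of items or performance measures passed in a scorable unit.

    results: list of (element_name, passed) pairs, one per measure.
    """
    return sum(1 for _, passed in results if passed)

def implied_weights(results):
    """Share of the unit score each task element can contribute."""
    counts = Counter(element for element, _ in results)
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}

# Hypothetical unit: element_a is covered by two measures, the others by one.
results = [
    ("element_a", True),
    ("element_a", True),
    ("element_b", False),
    ("element_c", True),
]
score = unit_score(results)
weights = implied_weights(results)
```

Here element_a carries half of the possible score simply because two measures were written for it, with no explicit weighting factor applied.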

The assumptions needed to help determine training requirements
pertain only to tasks taken one at a time. Since the current training
philosophy is to train on discrete tasks, no assumption about the
interrelationships among the tasks is required.


HELP SELECT SOLDIERS IN A SINGLE SPECIALTY

The case of using SQTs to help select soldiers in a single job
specialty requires additional assumptions about the interrelationships
among job tasks and scorable units that measure task performance. The
same three assumptions about measuring task performance are required
(tasks can be defined, tasks can be measured validly, and task elements
are weighted equally).

In addition, three more assumptions are required: (a) scorable units
are weighted equally (all are scored as GO/NO-GO or as one-zero); (b) the
test score is the number of scorable units performed correctly (the total
score is obtained by adding up the number of scorable units passed); and
(c) the percent of scorable units passed indicates level of job
performance (the percent of scorable units passed corresponds to the
proportion of job tasks a soldier can perform). Given these assumptions,
SQTs define the criterion of job proficiency, and the percent of scorable
units correct (called percent-correct) is a direct reflection of job
proficiency. Standards of job proficiency can then be set in terms of
percent-correct scores.
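
A minimal sketch of the percent-correct score under these assumptions, with each scorable unit recorded as 1 (GO) or 0 (NO-GO). The sample results are hypothetical.

```python
def percent_correct(unit_results):
    """Percent of scorable units passed, each unit weighted equally."""
    return 100.0 * sum(unit_results) / len(unit_results)

# Hypothetical soldier: 8 of 10 scorable units passed.
unit_results = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
score = percent_correct(unit_results)
```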


HELP  SELECT  SOLDIERS  IN  MERGED  SPECIALTIES 

In the case of merged specialties, an additional assumption is
required about the relationships among the jobs or groups of soldiers.
The first six assumptions made in the case of the single specialty result
in criterion-referenced measurement for each of the jobs being merged.
However, in order to maintain criterion-referenced standards for merged
specialties, the assumption is required that the jobs being merged are
equal; that is, equal levels of proficiency in the individual jobs are
equal to each other in an absolute sense, or stated operationally, all
scorable units from all the relevant SQTs are weighted equally. Thus a
soldier qualified in specialty 45N, for example, is equal to the
qualified soldier in 45P, regardless of the percentage of soldiers in each
qualified group. An implication of this assumption that the jobs being
merged are equal is that if one qualified group contained 5 percent of a
first MOS population while a second qualified group contained 50 percent
of a second MOS population, the merged qualified group would contain
proportionally more soldiers from the second group.


In the above example, each MOS would be represented in the merged
qualified group in accordance with the number of soldiers from each MOS
who attained qualifying scores. One MOS may be proportionally
overrepresented, while the second MOS is minimally represented or possibly
not represented at all. How to use and maintain performance standards
for merging MOS is a policy decision, and not a technical question.
However, the criterion-referenced properties of SQTs permit rational
policy decisions.
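
The composition of the merged qualified group can be worked through numerically. The sketch below uses the 5 percent and 50 percent qualification rates from the example. The population sizes of 1,000 soldiers per MOS are an assumption added for illustration.

```python
def merged_qualified_shares(population, percent_qualified):
    """Share of the merged qualified group contributed by each MOS."""
    qualified = {
        mos: population[mos] * percent_qualified[mos] / 100
        for mos in population
    }
    total = sum(qualified.values())
    return {mos: count / total for mos, count in qualified.items()}

population = {"first MOS": 1000, "second MOS": 1000}    # assumed sizes
percent_qualified = {"first MOS": 5, "second MOS": 50}  # from the example
shares = merged_qualified_shares(population, percent_qualified)
```

With equal populations, the first MOS supplies 50 of the 550 qualified soldiers, or about 9 percent of the merged group, and the second MOS supplies the remaining 91 percent.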

An alternative assumption in the case of merged specialties is that
the groups, and not the MOS, are equal; that is, equal percentile-rank
scores indicate equal levels of job proficiency. The use of
percentile-rank scores, which indicate relative standing in a group,
facilitates proportional representation of each MOS in the merged
qualified group. For example, a policy decision could be made that 40
percent of each MOS be considered eligible for promotion. Such a policy
decision might be made if policy makers were not willing to assume that
the jobs were equal or that the SQTs were equally valid
criterion-referenced measures of all the merged MOS, or if the policy
makers decided that the need for proportional representation of the MOS
in the qualified group outweighed the need to maintain performance
standards. However, if SQTs are scored as percentile-rank and
qualifications are based on percentile-rank scores, then the job
performance standards would be given little or no consideration in
determining the qualified group.
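
The effect of a percentile-rank rule on performance standards can be sketched with the 40 percent figure from the example. The percent-correct scores for the two MOS are hypothetical.

```python
import math

def top_percent(scores, percent):
    """Scores of the top `percent` of a group, highest first."""
    n_selected = math.ceil(len(scores) * percent / 100)
    return sorted(scores, reverse=True)[:n_selected]

# Hypothetical percent-correct scores for two MOS being merged.
mos_a = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
mos_b = [70, 65, 60, 55, 50, 45, 40, 35, 30, 25]

eligible_a = top_percent(mos_a, 40)
eligible_b = top_percent(mos_b, 40)
implied_cutoffs = (min(eligible_a), min(eligible_b))
```

Each MOS contributes four eligible soldiers, but the lowest eligible score is 80 percent correct in one MOS and 55 percent correct in the other. Proportional representation is obtained only by letting the effective performance standard differ between the MOS.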


BENEFITS  FROM  USING  CRITERION-REFERENCED  SQTs 

The change in focus from norm-referenced Military Occupational
Specialty proficiency tests to criterion-referenced Skill Qualification
Tests has enabled training and personnel management to obtain more
comprehensive and meaningful information than before. Two major benefits
that have resulted from the adoption of the criterion-referenced approach
deal with (a) public nature of test content, and (b) job performance
standards vs. personnel quotas. These benefits are discussed separately
in the following paragraphs.


PUBLIC  NATURE  OF  TEST  CONTENT 

An effective job proficiency testing program should be part of a
larger system that includes job requirements and individual training
programs. Modern instructional technology emphasizes the systems
approach to training, and a job proficiency testing program is an integral
component of the Army's modern training system.

Job requirements are defined by Soldier's Manuals, which list all the
tasks a soldier in an MOS skill level (job) is responsible for performing.
Soldier's Manuals are distributed throughout the Army for use by
individual soldiers and for developing training programs, both resident
courses and decentralized training conducted in units. Soldier's Manuals
are also used to develop SQTs. No task can be tested that is not in the
Soldier's Manual. Once the system becomes fully operational, all components
of the Army can know what each soldier should be able to do, is able to
do, and should be trained to do. There will be no surprise requirements.


In addition to Soldier's Manuals, soldiers are given additional
detailed information about the job tasks on which they will be tested.
This information is contained in the SQT Notice, which lists the specific
tasks included in an SQT, how the tasks will be tested (written or
hands-on), standards, and a description of the actual test content.
Soldiers are given advance notice of what they will be required to know
and do. All soldiers in an MOS are given equal information about what
they will be tested on, potentially allowing them equal opportunity to
prepare for the test. Test content, at least in general terms, is public
knowledge.

The public nature of test content reduces the need for representative
sampling of tasks. One reason representative sampling of tasks is
important in the typical testing program is to give all examinees an
equal opportunity to demonstrate their competence. With the SQT Notice,
test content can be focused in special areas, such as areas that have
high training needs or that are related to new equipment in the field.

The public nature of SQT content also helps establish an integrated
training and testing program based on critical job requirements. By
selecting test content that focuses on critical job requirements, training
efforts will tend to be directed toward these same requirements. Thus,
an integrated training and testing system is being developed based on
job requirements.

As  long  as  individuals  are  tested  on  the  specific  requirements  of 
their  jobs,  there  is  no  advantage  to  keeping  the  test  content  secret. 

In  fact,  if  the  test  is  directly  related  to  performance  on  the  job,  then 
the  proficient  individual  should  already  know  the  test  content  without 
the  benefit  of  the  information  contained  in  a test  notice. 

Minimizing  Effects  of  Job  Assignments  on  Test  Scores.  A problem 
that  arises  in  the  typical  testing  program,  where  test  content  is  kept 
secret,  is  that  some  individuals  have  special  advantages  over  others. 

One  possible  advantage  is  that  because  of  favorable  job  assignments,  job 
tasks  and  test  content  are  very  closely  related  for  some  individuals. 

In the past, soldiers who were working outside of their MOS were at a
distinct disadvantage on test content based on MOS-specific job
tasks. The effects of bad assignments are minimized in the SQT program
because all MOS soldiers are told specifically what content will be
included in the test. The prior knowledge about test content tends to
equalize opportunities.

In the past some soldiers have had advantages because they were more
familiar with the voluminous references given for MOS tests. Some
soldiers did not have the references available to them, and some, even if
they did, had difficulty in identifying the critical information within
the mass of paper and words. In the Soldier's Manual and SQT Notices
the critical information is distilled and made available to all MOS
soldiers. Thus, soldiers with high verbal fluency or with access to
specialized information no longer retain such a distinct advantage.

Since  the  critical  information  is  made  available  to  all  soldiers  in  a 
form  readily  understood,  the  opportunities  to  acquire  competence  are 
equally  available  to  all  soldiers. 


Minimizing Fears About Taking Tests. Some individuals seem to have a
knack for doing well on tests, while others seem to freeze when confronted
with a testing situation. Test wiseness is frequently cited as an
explanation of why some do better than expected, and test anxiety is
ascribed as a reason why some do more poorly than expected. Both of these
factors, test wiseness and test anxiety, are undesirable influences because
they distort the meaning of test scores. In the SQT program where everyone
has an opportunity to practice for the test, the effects of test wiseness
and test anxiety are minimized, and the scores are more likely to reflect
true levels of competence.


A factor related to test wiseness and test anxiety is the threat that
many soldiers experience when taking tests. The threat may be viewed as
having both objective and subjective components. A major source of
objective threat arises from the fact that SQTs are used to help make
personnel decisions that affect careers. Soldiers who do poorly on SQTs
are likely to be penalized, while those who do well are rewarded. The
test then, understandably, poses a threat to many soldiers, especially
those who are marginal performers or who are not familiar with testing,
or who have had negatively conditioning experiences in school situations.
Subjective components of threat may arise from a variety of circumstances,
such as personal characteristics, prior experience with tests, or from a
fear of being evaluated. The fear of being evaluated may arise because
the rules or basis for the evaluation are not explicit. If soldiers have
foreknowledge about the tasks they will be evaluated on, and the means by
which the evaluation will be conducted, then the subjective threat may
often be reduced. Prior knowledge about test content may equalize
opportunities for soldiers to demonstrate their true level of job
competence by reducing distortion of test scores arising from subjective
threat.


The public nature of test content also has the general effect of
increasing the validity of the tests. By giving all MOS soldiers more of
an equal opportunity to prepare for the test, the test scores are more
likely to reflect true levels of competence.


JOB  PERFORMANCE  STANDARDS  VS.  PERSONNEL  QUOTAS 


A criterion-referenced job proficiency test consisting of task-based
tests can be scored in terms of percent of tests correct, which is a
direct indicator of the percentage of job tasks a soldier can perform,
and therefore is a direct measure of level of job competence. The
percent of task-based tests correct can be interpreted because standards
are specified. The distribution of scores is not a relevant
consideration in interpreting the meaning of the scores.¹


¹ Norm-referenced proficiency tests, in which items have no meaning in terms of job-related activities, have meaning only in
terms of percentile-rank scores. The percentage of items correct does not convey information because the population of
items has not been defined precisely. Since such test scores have no external referent, the scores can be interpreted only
in relation to the group taking that particular set of items. The tendency, based on traditional psychometric theory, is
to select items on the basis of their difficulty and correlation with total test score. If items do not have the desired
statistical properties, they are deleted or revised until they exhibit the proper difficulties and correlations with total score.
Resulting changes in test content, and therefore the correspondence between test and job content, are not systematically
taken into account.

For each task in an SQT two categories of performance are
established: qualified and not qualified. Therefore, SQTs provide GO/NO-GO
decisions on task performance. Soldiers either meet these standards or
they do not. The total SQT score is the sum of all scorable units
passed, which provides continuous scores ranging from all scorable units
correct to none, or 100 percent correct to 0 percent correct.

Current Army policy is that the SQT total score scale is divided into
three categories. The higher passing score, called the Qualification
Score, determines eligibility for award of the next higher skill level,
and therefore eligibility for promotion. Only persons with the
appropriate skill level are eligible for promotion. The Qualification Score
is set at 80 percent of the scorable units correct. The lower passing
score, called the Verification Score, determines eligibility to retain
the current skill level; the Verification Score is set at 60 percent of
the scorable units correct. Soldiers with SQT scores below 60 percent
correct may be reclassified to another MOS.
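
The three score categories can be expressed as a simple classification. The sketch below assumes that a score exactly at a cutoff meets it, which the text does not state explicitly. The category labels are paraphrases, not official terms.

```python
QUALIFICATION_SCORE = 80.0  # percent of scorable units correct
VERIFICATION_SCORE = 60.0   # percent of scorable units correct

def sqt_category(percent):
    """Place a percent-correct SQT score in one of three categories.

    Assumption: a score equal to a cutoff satisfies that cutoff.
    """
    if percent >= QUALIFICATION_SCORE:
        return "qualified for next higher skill level"
    if percent >= VERIFICATION_SCORE:
        return "verified at current skill level"
    return "may be reclassified"

categories = [sqt_category(p) for p in (85.0, 80.0, 72.5, 59.0)]
```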

Rank Ordering and Performance Categories. If SQT scores are also
used to rank order soldiers, then in most cases the criterion-referenced
power of the tests will be reduced or lost entirely. The following cases
illustrate this point: the number of eligibles is a) equal to, b) less
than, and c) greater than the quotas.

a) If the quotas and number of eligible soldiers are the same, then
the decisions of whether to promote, based on the hurdle, and when to
promote, based on rank order, have the same boundaries and there is no
conflict between quotas and standards.

b) If the number of eligibles is less than the quota and the
standards are waived until the quotas are met, then the rank ordering would
be used to decide both whether and when to promote. Waiving standards could
be equivalent to rank ordering. If the standards are waived one unit at
a time until the quotas are satisfied, then the effect is to rank order
with no regard to prerequisites. The waiving could also be done in larger
steps, say from 80 percent correct to 60 percent correct, with the decision
of when to promote then made on the basis of other factors. How the waiving
is accomplished, and how the tradeoff between standards and quotas is
achieved, are policy decisions. Waiving standards forces an explicit
decision about the tradeoff, whereas the pure rank ordering approach ignores
any consideration of standards. On the other hand, if standards are not
waived, then the rank ordering would be used only to decide when to
promote. In this case the quotas would be waived in favor of increased
quality.

c) If the number of eligibles is greater than the quota, then
depending on how the pool of eligibles becomes replenished, the
prerequisite standards may have varied meaning. If the pool of eligibles is
always larger than the quota, then some soldiers near the cutting score
may not be reached and consequently not promoted. If the pool is
exhausted before new soldiers are added, then these soldiers are assured
eventual promotion, and new soldiers who become eligible are placed into
a hold category until the original pool is exhausted. If the new
eligible soldiers are immediately added to the pool, then there is no
assurance that the remaining eligible soldiers from the original pool
will be promoted even though they surpassed the prerequisite standards.
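
The three cases can be sketched as one selection rule. This is an illustration only: the soldier labels, scores, and the `waive_standards` flag are hypothetical, and pool replenishment over time is not modeled.

```python
def select_for_promotion(scores, quota, standard=80, waive_standards=False):
    """Select soldiers for promotion under a quota.

    scores: mapping of soldier label to percent-correct SQT score.
    Eligibles (score >= standard) are taken in rank order. If they do
    not fill the quota, the remainder is filled by rank order alone
    only when waive_standards is True.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    eligible = [name for name, score in ranked if score >= standard]
    if len(eligible) >= quota or not waive_standards:
        return eligible[:quota]
    return [name for name, _ in ranked[:quota]]

scores = {"A": 92, "B": 85, "C": 78, "D": 66}  # hypothetical soldiers

equal_to_quota = select_for_promotion(scores, quota=2)
short_no_waiver = select_for_promotion(scores, quota=3)
short_with_waiver = select_for_promotion(scores, quota=3, waive_standards=True)
more_than_quota = select_for_promotion(scores, quota=1)
```

With a quota of three and no waiver, only A and B are selected and the quota goes unfilled. With the waiver, C is promoted despite scoring below the standard. With a quota of one, B meets the standard but is not reached.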

The  main  point  about  hurdles  vs.  rank  ordering  is  that  the  criterion- 
referenced  standards  may  be  lost  to  the  rank  order  unless  explicit 
decisions  are  made  to  retain  the  standards.  Rank  ordering  lends  itself 
so  easily  to  satisfying  quotas  that  performance  standards  may  be  readily 
bypassed.  The  ability  to  obtain  objective  standards  of  job  performance 
has  profound  impact  on  how  personnel  decisions  can  be  made.  Personnel 
managers  have  a choice  between  using  a priori  derived  standards, 
independent  of  the  population  taking  the  test,  and  using  quotas  derived 
independent of the content of the test. The traditional solution to
personnel decisions is to establish quotas, and then to select
individuals until the quotas are satisfied.

According to the criterion-referenced test model, levels of
performance within a proficiency category are not discriminated because the
criterion levels are the only points of interest. Continuous scores are
available, however, and they can be used for rank ordering soldiers.
Because  SQTs  can  be  scored  either  in  terms  of  performance  categories  or 
as  continuous  scores,  explicit  decisions  can  be  made  about  which 
methods  or  combination  of  methods  to  use,  and  how  the  scores  will  be 
used  in  personnel  decisions. 

As  a minimum,  SQTs  are  used  to  set  prerequisites  for  promotion.  As 
described  above,  the  prerequisite  score  is  waived  to  meet  quotas  if  such 
a policy decision is made. An immediate question is whether SQT scores
should  be  used  to  rank  order  the  pool  of  soldiers  eligible  for  promotion. 
To  oversimplify  the  question:  SQTs  are  now  used  to  determine  whether  to 
promote.  The  question  of  when  to  promote  can  also  be  answered  on  the 
basis  of  SQT  scores,  or  can  be  based  on  other  factors.  (Other  factors 
besides SQT scores do affect promotability, but the oversimplified
version  puts  the  issue  in  stark  relief.)  A discussion  of  how  SQT  scores 
can  be  combined  with  other  factors  is  presented  later  in  this  section. 

An unfortunate consequence of using quotas is that performance
standards, which may be used in delineating a quota limit for one particular
point in time, may not be entirely relevant when applied in another
situation. If, for example, the top 50 percent in a job is eligible for
promotion, the job performance of the eligible group will vary as the
soldiers change over the years, or as the effectiveness of the training
programs changes, or as the relationship between test content and job
requirements changes over time.

Quality vs. Quantity in Personnel Decisions. A major breakthrough
resulting from criterion-referenced SQTs is the availability of objective
information about job competence that can be included in making personnel
decisions. Level of job performance measured by these tests provides an
absolute indication of proficiency that remains relatively constant as
long as jobs remain defined by existing Soldier's Manuals. Performance
standards for personnel decisions can be specified in terms of the
percentage of job tasks soldiers can perform. These standards are
external to the test, and therefore more powerful statements can be made
about the groups that are eligible to be selected in or out.

Quotas for personnel actions, such as promotion or attendance at a
school, are likely to remain a driving force for personnel management in
the foreseeable future. Rarely, if ever, will the number of soldiers
eligible for a personnel action, based on performance standards, be the
same as the required quota. Some adjustment to the quotas or performance
standards, or both, generally will be required. If quotas are given top
priority, then standards are waived; conversely, if performance is given
top priority, then quotas are waived. If both quotas and performance
are waived, say within some pre-established bounds, then a tradeoff
between quality and quantity can be established.

Decision rules about quality vs. quantity can be explicitly stated.
If performance standards are waived, there is a cost in terms of lowered
individual performance (quality) in order to obtain sufficient numbers
(quantity). If quotas are waived, there is a gain in individual
performance (quality), but insufficient numbers (quantity) are obtained. By
assigning values to units of performance and shortfalls, the tradeoff
between quantity and quality can be calculated. Again, the tests do not
dictate policy about quantity or quality, but they support decision rules
and permit operations not possible without them.
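
One way such a calculation might be set up is sketched below. The cost values, the linear cost form, and the scores are all assumptions chosen for illustration; the report does not prescribe a formula.

```python
def tradeoff_cost(scores, quota, cutoff, standard=80,
                  cost_per_point=1.0, cost_per_vacancy=10.0):
    """Total cost of filling a quota at a given (possibly waived) cutoff.

    Quality cost: points below the standard among those selected.
    Quantity cost: quota positions left unfilled.
    Both cost values are hypothetical policy inputs.
    """
    selected = sorted((s for s in scores if s >= cutoff), reverse=True)[:quota]
    quality_loss = sum(max(0, standard - s) for s in selected)
    shortfall = quota - len(selected)
    return cost_per_point * quality_loss + cost_per_vacancy * shortfall

scores = [92, 85, 78, 66, 61]  # hypothetical percent-correct scores
costs = {cutoff: tradeoff_cost(scores, quota=4, cutoff=cutoff)
         for cutoff in (80, 70, 60)}
best_cutoff = min(costs, key=costs.get)
```

Under these assumed values, holding the standard at 80 leaves two vacancies (cost 20), waiving to 60 fills the quota at a quality cost of 16, and waiving to 70 gives the lowest total cost of 12. Different assigned values would shift the answer, which is why the choice remains a policy decision.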

Weighting  Factors  in  Personnel  Decisions.  The  situation  becomes  more 
complex  when  one  does  not  base  personnel  decisions  exclusively  on  test 
scores,  but  rather  uses  test  scores  as  one  factor  in  a composite  score. 
Army  personnel  actions  generally  have  been  based  on  a composite  score, 
which  is  characterized  as  the  whole-man  concept.  The  composites  may  be 
governed  by  explicit  rules  to  provide  objective  indices,  or  the  variables 
may  be  combined  in  a subjective  manner  by  the  decisionmakers.  An  example 
of  explicit  rules  governing  the  combination  of  factors  is  Enlisted 
Evaluation  Scores  based  on  a weighting  of  MOS  test  scores  and  Enlisted 
Evaluation  Report  scores;  another  example  is  the  determination  of 
whether  a soldier  meets  the  prerequisites  for  a particular  job  training 
course,  in  which  aptitude  area  scores,  physical  profile,  and  perhaps 
prior  training  may  be  considered.  An  example  of  subjective  combination 
of  factors  is  the  process  followed  by  a typical  selection  board  that 
interviews  soldiers,  examines  their  records,  and  then  arrives  at  a 
collective  decision. 

Criterion-referenced standards require the use of explicit rules for
setting the minimum levels of qualification. If the process of combining
scores for the qualified group is objective, explicit weights are
assigned to each variable, and the contribution of each variable to the
composite score can be specified.


The assigned weights and the actual weights may or may not be the
same. The actual weight of a factor is determined largely by the
variability or range of scores for that factor. If the range is small,
the effect is to add a virtual constant value to each individual's
score, regardless of assigned weight, and the small differences can have
only a small effect on the final rank ordering of the soldiers. If the
combining is based on subjective judgment, then the weighting of the
variables cannot be explicated. In either case, an important
consideration is how the minimum qualifications are treated in determining
eligibility for a personnel action. If the standards do serve to categorize
soldiers into qualified and non-qualified groups and the qualified group
is then given the favorable treatment while the non-qualified group is
excluded from consideration, then the criterion-referenced standards are
operative. If, however, the minimum standards can be waived, then the
subjective process may easily ignore the standards, and the net effect
may be to lose the power that inheres in criterion-referenced standards.
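
The difference between assigned and actual weights can be illustrated with a small sketch. It approximates a factor's actual weight by its assigned weight multiplied by the standard deviation of its scores, ignoring correlation between factors; this approximation, the factor names, and the data are assumptions for illustration.

```python
from statistics import pstdev

def effective_weights(factors, assigned):
    """Approximate actual weight of each factor in a composite.

    Uses assigned weight times score standard deviation, normalized to
    sum to 1. Correlations between factors are ignored (a simplification).
    """
    spread = {
        name: assigned[name] * pstdev(list(values.values()))
        for name, values in factors.items()
    }
    total = sum(spread.values())
    return {name: s / total for name, s in spread.items()}

# Hypothetical data: SQT scores vary widely, report scores barely vary.
factors = {
    "sqt": {"A": 95, "B": 80, "C": 65},
    "report": {"A": 118, "B": 120, "C": 119},
}
assigned = {"sqt": 0.5, "report": 0.5}

weights = effective_weights(factors, assigned)
composite = {
    s: sum(assigned[f] * factors[f][s] for f in factors)
    for s in factors["sqt"]
}
ranking = sorted(composite, key=composite.get, reverse=True)
```

Although both factors are assigned a weight of 0.5, the SQT scores account for roughly 94 percent of the spread in the composite, and the final rank order (A, B, C) is the SQT rank order. The report scores act as a near constant.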

The  process  of  combining  scores  may  also  be  based  on  successive 
hurdles.  The  use  of  successive  hurdles  for  combining  scores  virtually 
assures  that  standards  will  be  maintained.  Establishment  of  the  minimum 
levels  of  qualifications  requires  explicit  decisions,  and  any  waiving 
then  must  also  be  explicit.  An  example  of  multiple  hurdles  is  the 
determination  of  eligibility  for  entrance  in  a job  training  course.  A 
minimum  aptitude  area  score  is  set,  usually  at  90,  and  other  minimum 
prerequisites may also be included in the decision, such as physical
profiles,  prior  military  job  training,  and  high  school  courses  completed. 
Not  all  eligible  persons  enter  a course,  but  unqualified  persons  are 
excluded  unless  a specific  waiver  is  applied.  The  use  of  hurdles  is 
compatible  with  criterion-referenced  standards. 
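
A successive-hurdles decision can be sketched as a list of explicit checks, each of which must be passed or explicitly waived. The minimum aptitude area score of 90 comes from the text; the other hurdle names, the candidate record fields, and the waiver mechanism are hypothetical.

```python
# Each hurdle is a (name, check) pair; every check must pass.
HURDLES = [
    ("aptitude_area_score", lambda c: c["aptitude_area_score"] >= 90),
    ("physical_profile", lambda c: c["physical_profile_ok"]),
    ("prior_training", lambda c: c["prior_training_complete"]),
]

def failed_hurdles(candidate, waived=()):
    """Names of hurdles the candidate fails and that are not waived."""
    return [
        name for name, check in HURDLES
        if name not in waived and not check(candidate)
    ]

def is_eligible(candidate, waived=()):
    """Eligible only if no unwaived hurdle is failed."""
    return not failed_hurdles(candidate, waived)

# Hypothetical candidate who misses the aptitude minimum.
candidate = {
    "aptitude_area_score": 87,
    "physical_profile_ok": True,
    "prior_training_complete": True,
}
without_waiver = is_eligible(candidate)
with_waiver = is_eligible(candidate, waived=("aptitude_area_score",))
```

Because a waiver must name the hurdle being set aside, any departure from the standard is recorded explicitly rather than absorbed into a composite score.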

SQTs, because of their criterion-referenced properties, permit basing
personnel decisions on objective performance standards. As has been
mentioned, technical feasibility does not necessarily dictate policy, and
therefore personnel decisions need not necessarily be based on
performance standards. However, since the possibility exists, rational
evaluation of the costs and benefits in changing to new personnel policies
can now be accomplished by decisionmakers.


CONCLUSIONS 

Two  themes  have  pervaded  the  discussion  of  criterion-referenced  Skill 
Qualification  Tests:  1)  test  content  is  based  on  systematic  analysis  of 
job  requirements;  2)  SQTs  provide  new  opportunities  for  training  managers, 
personnel  managers,  and  research  personnel  to  reassess  and  redefine  their 
functions. 

SQTs provide new information about levels of job performance not
previously available from traditional proficiency and achievement tests.
However, the power inherent in this information would be lost unless
explicit use is made of the criterion-referenced performance data
available from SQT scores.

For training managers and job supervisors, feedback from SQTs can be
used to structure individualized training programs based on critical job
tasks. Instead of basing training requirements on global evaluations of
performance, training programs can be based on specific job tasks that
are critical to both unit mission and individual job requirements.

Personnel  managers  have  responsibilities  for  defining  job  specialties 
and  for  matching  individuals  and  jobs.  Under  traditional  procedures, 
jobs  have  tended  to  be  defined  in  general  terms  of  functions,  skills, 
and  knowledges.  Similarly,  individual  qualifications  have  also  been 
assessed  in  global  terms,  such  as  total  MOS  proficiency  score,  training 
courses  completed,  or  time  in  grade.  With  the  technology  underlying  the
SQT  program,  and  modern  instructional  technology  generally,  both  job
requirements  and  individual  qualifications  can  be  stated  more  precisely:
critical  job  tasks  define  job  requirements,  and  performance  on  these
critical  tasks  defines  levels  of  proficiency.

Finally,  research  personnel  may  have  to  reconceptualize  their 
function.  Traditionally,  test  psychologists  have  focused  their  efforts 
on  developing  statistical  techniques  for  improving  the  accuracy  of  test 
scores.  However,  in  criterion-referenced  testing,  establishing  the
content  of  a test  is  prerequisite  to,  and  therefore  perhaps  even  more
important  than,  improving  the  accuracy  of  test  scores.  The  interpretation
of  test  scores  in  criterion-referenced  testing  is  always  dependent  on 
being  able  to  provide  an  explicit  linkage  between  test  content  and  test 
scores.  Research  efforts  are  required  that  explore  and  define  the 
relationship  between  test  content  and  test  scores.  For  example,  there  is 
a need  for  research  on  development  of  score  scales  designed  to  reflect 
realistic  standards  of  performance. 

Because  of  the  need  to  establish  an  operational  testing  program  to 
meet  a tight  schedule,  some  decisions  were  made  that  appear  reasonable
but  are  not  supported  by  existing  test  theory.  One  example  of  such  a
decision  is  how  to  match  scores  from  different  tests.  SQTs  are  assumed 
to  be  of  equal  difficulty  and  relevance  to  all  job  incumbents,  which  is  a 
most  reasonable  assumption  given  the  current  state  of  the  art.  New 
theoretical  developments  are  required  to  develop  score  scales  that  can 
equate  scores  of  soldiers  tested  on  different  tasks.  A promising 
approach  is  available  in  latent  trait  theory,  which  addresses  many  of  the 
problems  faced  in  developing  SQTs.  The  applicability  of  latent  trait 
theory,  however,  has  not  yet  been  sufficiently  demonstrated  in  any
large-scale  testing  program,  especially  one  confronted  with  the  limited 
resources  available  to  test  development  activities. 
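
The equating problem described above can be illustrated with a minimal sketch. It assumes the simplest latent trait model, the one-parameter logistic (Rasch) model, which the report does not itself specify. The function names and the task difficulty values are hypothetical and chosen only for illustration. Under this model, a soldier's ability estimate is the value at which the expected number of tasks passed equals the observed number, so the same raw score on a harder set of tasks yields a higher estimate.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of passing a task under the one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_score(ability, difficulties):
    """Expected number of tasks passed for a given ability."""
    return sum(rasch_probability(ability, d) for d in difficulties)


def estimate_ability(num_passed, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Find, by bisection, the ability whose expected score equals num_passed.

    Assumes task difficulties are already known. The estimate is finite only
    when the soldier passes some, but not all, of the tasks.
    """
    if not 0 < num_passed < len(difficulties):
        raise ValueError("estimate is undefined for zero or perfect scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < num_passed:
            lo = mid  # expected score too low, so ability must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical difficulties for two task sets given to different soldiers.
easy_tasks = [-1.0, -0.5, 0.0, -0.5, -1.0]
hard_tasks = [0.5, 1.0, 1.5, 1.0, 0.5]

ability_on_easy = estimate_ability(3, easy_tasks)
ability_on_hard = estimate_ability(3, hard_tasks)
print(ability_on_easy < ability_on_hard)
```

In this sketch, two soldiers who each pass three of five tasks receive different positions on the common scale because their task sets differ in difficulty. In practice the task difficulties would themselves have to be estimated from test data, which is the resource-intensive step noted above.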